Automatic Product Classification and Clustering Solutions in a Retail Context
Authors
Abstract
In this report we propose a methodology to automatically classify products and to cluster similar products together. This improves user interaction and performance metrics for e-commerce (product shopping) websites, with applications in product search, site navigation, product comparison, etc. For this project, a variant of the multinomial Naive Bayes algorithm was used for classification. We present findings on the key aspects that demonstrate its accuracy. A two-step clustering methodology was developed to handle large datasets while avoiding the N² complexity of comparing every pair of items. The first step uses Locality Sensitive Hashing; the second step uses a shingle matching method. We describe a methodology that allows trading cluster size against the similarity of items within a cluster, and we propose a way to use clustering to improve classification accuracy.
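The two-step clustering idea can be illustrated with a minimal Python sketch. The abstract does not specify the hash family, the shingle definition, or how matches are merged, so everything concrete here is an assumption made for illustration: word-level shingles, MinHash signatures with banding as the Locality Sensitive Hashing step, Jaccard similarity as the shingle matching step, and union-find to merge matched pairs. The function names (`shingles`, `minhash_signature`, `cluster`) and the parameters (`k`, `num_hashes`, `bands`, `threshold`) are hypothetical and are not taken from the report.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=2):
    """Return the set of k-word shingles of a product title (assumed definition)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def minhash_signature(shingle_set, num_hashes=20):
    """MinHash signature: the minimum hash value per seed."""
    return tuple(min(_hash(seed, s) for s in shingle_set)
                 for seed in range(num_hashes))


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def cluster(titles, k=2, num_hashes=20, bands=10, threshold=0.5):
    """Group titles: LSH buckets propose candidates, shingle overlap confirms them."""
    sets = [shingles(t, k) for t in titles]
    sigs = [minhash_signature(s, num_hashes) for s in sets]
    rows = num_hashes // bands

    # Step 1: items whose signatures agree on any band share a bucket,
    # so only items within a bucket are ever compared.
    buckets = defaultdict(list)
    for idx, sig in enumerate(sigs):
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(idx)

    # Step 2: confirm candidate pairs by exact shingle overlap and
    # merge confirmed pairs with union-find.
    parent = list(range(len(titles)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for members in buckets.values():
        for i, j in combinations(members, 2):
            if jaccard(sets[i], sets[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for idx in range(len(titles)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())
```

In this sketch, `threshold` is the knob that trades cluster size against within-cluster similarity: a lower value merges more loosely related items into larger clusters, while a higher value yields smaller, tighter clusters. Pairs are still compared exhaustively inside each bucket, so the saving over an all-pairs comparison depends on the buckets staying small.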
Similar resources
Improved Automatic Clustering Using a Multi-Objective Evolutionary Algorithm With New Validity measure and application to Credit Scoring
In data mining, clustering is an important technique for separating and grouping unlabeled data. In this paper, an attempt has been made to improve and optimize the application of heuristic clustering methods such as the genetic algorithm, PSO, artificial bee colony, harmony search, and differential evolution to the unlabeled data of an Iranian bank...
A CAD System Framework for the Automatic Diagnosis and Annotation of Histological and Bone Marrow Images
Due to the ever-increasing volume of medical image data in the world’s medical centers and recent developments in medical imaging hardware and technology, software analysis of medical data has become necessary. Equipping medical science with intelligent tools for the diagnosis and treatment of illnesses has reduced physicians’ errors and physical and financial damages. In this article we pr...
Automatic Detection and Localization of Surface Cracks in Continuously Cast Hot Steel Slabs Using Digital Image Analysis Techniques
Quality inspection is an indispensable part of modern industrial manufacturing. Steel as a major industry requires constant surveillance and supervision through its various stages of production. Continuous casting is a critical step in the steel manufacturing process in which molten steel is solidified into a semi-finished product called slab. Once the slab is released from the casting unit, th...
Cooperative Advertising and Pricing in a Supply Chain: A Bi-level Programming Approach
Nowadays, coordination between members in a supply chain has become very important and beneficial to channel members. Through cooperative advertising, manufacturers and retailers can jointly participate in promotional programs. This action not only reduces the cost of advertising, but also is important to create a link with local retailers in order to increase immediate sales at the retail leve...
Automatic Interpretation of UltraCam Imagery by Combination of Support Vector Machine and Knowledge-based Systems
With the development of digital sensors, an increasing number of high-resolution images are available. Manual interpretation of these images is not feasible, which calls for practical, fast, and automatic solutions to environmental and location-based management problems. Land cover classification using high-resolution imagery is a difficult process because of the c...
Publication date: 2013